Ford GoBike System Data

by Hany Y Wasef


Preliminary Wrangling

In August 2013, the Bay Area Bike Share system began operating in the San Francisco Bay Area of California. Customers go to a dock (bike station), unlock a bike through the app, and can then return it to any of the stations located around the city. The system allocated half of its 700-bicycle fleet to San Francisco. In 2015, it was announced that the scheme would expand to 7,000 bikes, and over 2016–2017 it became Ford GoBike through a partnership with Ford Motor Company (source: wikipedia.org/wiki).
In this project, I focus on individual rides made in the bike-sharing system covering the greater San Francisco Bay Area in February 2019 (data: Fordgobike-Tripdata/201902).

Loading Libraries

Gathering and Assessing Data

The raw data contains 183,412 individual rides made in the bike-share system, with 16 features. The data needs cleaning and tidying before analysis.
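A minimal sketch of the gathering and assessing step, assuming pandas is used. The real trip-data CSV is replaced here by a two-row in-memory sample with only a subset of its columns and made-up values; in practice the CSV's path would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the trip-data CSV (subset of columns, made-up values).
csv_text = """duration_sec,start_time,end_time,start_station_id,user_type,member_birth_year,member_gender
600,2019-02-01 08:05:00,2019-02-01 08:15:00,21.0,Subscriber,1985.0,Male
900,2019-02-02 17:30:00,2019-02-02 17:45:00,86.0,Customer,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assess: size, column types, missing values per column, duplicate rows.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print(df.duplicated().sum())
```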

Cleaning Data

Delete unneeded columns

Missing values
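The first two cleaning steps (deleting unneeded columns and handling missing values) could look like the sketch below, assuming pandas. The original does not say which columns were deleted, so the `unneeded` list is hypothetical, as are the toy values. Rides with missing member information are dropped rather than imputed, which is one common choice for these fields.

```python
import numpy as np
import pandas as pd

# Toy rows using raw column names from the trip data (values are made up).
df = pd.DataFrame({
    "duration_sec": [600, 900, 1200],
    "start_station_latitude": [37.77, 37.78, 37.79],
    "start_station_longitude": [-122.41, -122.40, -122.39],
    "user_type": ["Subscriber", "Customer", "Subscriber"],
    "member_birth_year": [1985.0, np.nan, 1990.0],
    "member_gender": ["Male", np.nan, "Female"],
})

# Hypothetical choice of columns the analysis does not need.
unneeded = ["start_station_latitude", "start_station_longitude"]
df = df.drop(columns=unneeded)

# Drop rides whose member information is missing, then renumber the rows.
df = df.dropna(subset=["member_birth_year", "member_gender"]).reset_index(drop=True)
print(df.shape)
```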

Convert data types

Fix outliers
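A sketch of type conversion and outlier handling under stated assumptions: timestamps become datetimes, IDs become strings, `user_type` becomes a categorical, and birth years become integers. The outlier rule (dropping birth years before 1920) is a hypothetical cutoff chosen for illustration, not necessarily the rule used in this analysis.

```python
import pandas as pd

# Made-up rows; 1900 plays the role of an implausible birth year.
df = pd.DataFrame({
    "start_time": ["2019-02-01 08:05:00", "2019-02-02 17:30:00", "2019-02-03 12:00:00"],
    "bike_id": [101, 102, 103],
    "user_type": ["Subscriber", "Customer", "Subscriber"],
    "member_birth_year": [1985.0, 1900.0, 1992.0],
})

# Convert data types.
df["start_time"] = pd.to_datetime(df["start_time"])
df["bike_id"] = df["bike_id"].astype(str)               # IDs are labels, not numbers
df["user_type"] = df["user_type"].astype("category")
df["member_birth_year"] = df["member_birth_year"].astype(int)

# Fix outliers: keep only plausible birth years (assumed cutoff of 1920).
df = df[df["member_birth_year"] >= 1920].reset_index(drop=True)
```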

Create new columns

Finally, save the clean data
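New columns and the final save might be produced as follows. The names `start_day`, `start_hour`, and `member_age` come from the text; the column name `duration_min`, the reference year 2019 for computing age, and the output file name are assumptions.

```python
import os
import tempfile

import pandas as pd

# Toy rows after type conversion (values are made up).
df = pd.DataFrame({
    "duration_sec": [600, 1500],
    "start_time": pd.to_datetime(["2019-02-01 08:05:00", "2019-02-02 17:30:00"]),
    "member_birth_year": [1985, 1990],
})

df["duration_min"] = df["duration_sec"] / 60               # hypothetical column name
df["start_day"] = df["start_time"].dt.day_name()           # e.g. "Friday"
df["start_hour"] = df["start_time"].dt.hour                # 0-23
df["member_age"] = 2019 - df["member_birth_year"]          # assumes age as of 2019

# Save the clean data (hypothetical file name in the temp directory).
path = os.path.join(tempfile.gettempdir(), "fordgobike_clean.csv")
df.to_csv(path, index=False)
```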

What is the structure of your dataset?

The cleaned dataset contains 174,952 individual rides made in the bike-share system, with 18 features that represent:

What is/are the main feature(s) of interest in your dataset?

The main feature of interest is the trip duration in minutes.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I expect that the day of the week and the hour of the day [start_day, start_hour] will affect trip duration. I also expect the user information [user_type, member_gender, member_age] to help identify the main target users.

Univariate Exploration

First, I look at the main feature of interest: trip duration in minutes.

The trip duration has a long-tailed, right-skewed distribution: few trips last more than 30 minutes, and more than 90% of trips are shorter than 1 hour.
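Shares like these can be computed directly from a duration series. This sketch uses ten made-up durations rather than the real data; a positive `skew()` value is consistent with a right-skewed distribution.

```python
import pandas as pd

# Made-up trip durations in minutes (illustrative only, not the real data).
duration_min = pd.Series([3.5, 5.0, 7.2, 9.8, 12.0, 14.5, 22.0, 35.0, 48.0, 95.0])

# The mean of a boolean Series is the fraction of True values.
share_over_30 = (duration_min > 30).mean()
share_under_60 = (duration_min < 60).mean()
skewness = duration_min.skew()  # > 0 indicates a long upper (right) tail

print(f"over 30 min: {share_over_30:.0%}, under 1 hour: {share_under_60:.0%}")
```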

How long does the average trip take in minutes?

The plot above shows that most users tend to use the bikes for short trips: most trips take between 4 and 15 minutes.
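A log scale is a common way to read such a long-tailed variable. The sketch below builds log-spaced bin edges (steps of 0.1 in log10 space, an arbitrary choice) that could be passed to a histogram drawn with a logarithmic x-axis, and computes the share of trips between 4 and 15 minutes. It assumes every duration is at least 1 minute, so the first edge can be 10^0 = 1; the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up durations in minutes, all >= 1 (assumption for the first bin edge).
duration_min = pd.Series([1.2, 3.0, 4.5, 6.0, 8.0, 10.0, 12.5, 15.0, 40.0, 300.0])

# Bin edges 10**0, 10**0.1, ... up to just past the maximum duration.
step = 0.1
edges = 10 ** np.arange(0, np.log10(duration_min.max()) + step, step)
counts, _ = np.histogram(duration_min, bins=edges)

# Share of trips between 4 and 15 minutes (both ends inclusive).
share_4_15 = duration_min.between(4, 15).mean()
```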

On which days of the week are most trips taken?

The figure above shows that most trips were taken on Thursday, and the fewest on Saturday and Sunday.
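Counting trips per weekday only needs `value_counts`, followed by a reindex so that the days appear from Monday to Sunday instead of by frequency. The sample below is invented.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up start days for ten rides.
start_day = pd.Series(["Thursday", "Thursday", "Tuesday", "Monday", "Thursday",
                       "Friday", "Wednesday", "Saturday", "Tuesday", "Sunday"])

# Count rides per day, in calendar order, with 0 for days without rides.
trips_per_day = start_day.value_counts().reindex(days, fill_value=0)
busiest = trips_per_day.idxmax()
```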

At what time of day are most trips taken?

The figure above shows that the most trips were taken at 5 PM and 8 AM, while the number of trips drops after midnight.
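Hourly counts follow the same pattern; reindexing over `range(24)` keeps hours without trips as explicit zeros, and `nlargest` picks out the peak hours. The `start_hour` values here are made up.

```python
import pandas as pd

# Made-up start hours for twelve rides.
start_hour = pd.Series([8, 8, 17, 17, 17, 9, 12, 18, 8, 17, 1, 23])

# Rides per hour for every hour 0-23, in clock order.
trips_per_hour = start_hour.value_counts().reindex(range(24), fill_value=0)

# The two busiest hours, most rides first.
peak_hours = trips_per_hour.nlargest(2).index.tolist()
```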

What is the most common type of bike-share user? Do more male members than female members use the bike share?

From the previous figures, we noticed the following:
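The proportions behind these user-type and gender comparisons can be obtained with `value_counts(normalize=True)`. This is a sketch on a ten-row toy frame; the values are illustrative, not taken from the data.

```python
import pandas as pd

# Toy frame with made-up labels.
df = pd.DataFrame({
    "user_type": ["Subscriber"] * 8 + ["Customer"] * 2,
    "member_gender": ["Male"] * 6 + ["Female"] * 3 + ["Other"],
})

# Fraction of rides per category (each result sums to 1).
user_share = df["user_type"].value_counts(normalize=True)
gender_share = df["member_gender"].value_counts(normalize=True)
```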

What is the average age of members?

According to the previous figure, most members are between about 25 and 45 years old.
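A short numeric summary can complement the age histogram: the mean, the share of members aged 25 to 45, and the quartiles. The ages below are invented, and `member_age` is assumed to have been derived from the birth year during cleaning.

```python
import pandas as pd

# Made-up member ages.
member_age = pd.Series([22, 26, 28, 31, 33, 35, 38, 41, 44, 58])

mean_age = member_age.mean()
share_25_45 = member_age.between(25, 45).mean()   # inclusive on both ends
q1, q3 = member_age.quantile([0.25, 0.75])        # linear interpolation by default
```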

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

The trip duration in minutes has a long-tailed, right-skewed distribution: few trips last more than 30 minutes, and more than 90% of trips are shorter than 1 hour. I therefore applied a scale transformation (a log scale on the duration axis), after which it is easy to see that most users take short trips: most trips last between 4 and 15 minutes.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

From other plots, I observed that:

Bivariate Exploration

How long does the average trip take for each day of the week?

The figure above shows that trips on weekdays are short, while trips on the weekend (Saturday and Sunday) are longer.
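The per-day averages behind this comparison can be computed with a `groupby` on `start_day`, reindexed to calendar order. This is a sketch on made-up durations; `duration_min` is an assumed column name.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up rides.
df = pd.DataFrame({
    "start_day": ["Monday", "Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Saturday", "Sunday"],
    "duration_min": [8.0, 10.0, 9.5, 9.0, 10.5, 11.0, 18.0, 22.0, 20.0],
})

# Mean trip duration per day, Monday to Sunday.
mean_by_day = df.groupby("start_day")["duration_min"].mean().reindex(days)
```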

How long are trips, in minutes, for each hour of the day?

Trips are short during most hours of the day, except at 2 AM and 3 AM, when trips are longer.

How does trip duration in minutes vary by user type and member gender?

We observed:
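Because durations are right-skewed, a median per user type and gender is a robust summary for this comparison; the analysis may have plotted means or full distributions instead, so this is only one possible choice. The toy values are hypothetical.

```python
import pandas as pd

# Made-up rides: two per user type and gender combination.
df = pd.DataFrame({
    "user_type": ["Subscriber"] * 4 + ["Customer"] * 4,
    "member_gender": ["Male", "Male", "Female", "Female"] * 2,
    "duration_min": [8.0, 10.0, 9.0, 13.0, 14.0, 18.0, 16.0, 24.0],
})

# Rows: user type, columns: gender, cells: median duration in minutes.
table = df.groupby(["user_type", "member_gender"])["duration_min"].median().unstack()
```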

Is there a relationship between the age of the member and the duration of the trip?

The relationship between age and trip duration is negative, as expected: as age increases, trip duration decreases.
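The strength of this relationship can be quantified with a correlation coefficient. The sketch computes Pearson's r with `Series.corr` and a rank-based (Spearman-style) coefficient by correlating the ranks, which avoids a SciPy dependency. The data are synthetic and deliberately decreasing.

```python
import pandas as pd

# Synthetic data where duration falls as age rises.
df = pd.DataFrame({
    "member_age": [20, 25, 30, 35, 40, 50, 60],
    "duration_min": [14.0, 13.0, 12.5, 11.0, 10.5, 9.0, 8.0],
})

# Pearson correlation (linear association).
r = df["member_age"].corr(df["duration_min"])

# Spearman-style correlation: Pearson correlation of the ranks (monotonic association).
rho = df["member_age"].rank().corr(df["duration_min"].rank())
```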

At which hours of the day are most trips taken, by user type?

Subscriber usage clearly peaks during typical rush hours, when people commute to work in the morning and leave work in the afternoon, while customers tend to ride most in the afternoon or early evening. Subscribers also take roughly twice as many trips as customers throughout the day.
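The hourly comparison by user type is a two-way count table, which `pd.crosstab` builds directly; its column totals compare overall volumes, and `idxmax` gives each user type's peak hour. The toy rows are made up.

```python
import pandas as pd

# Made-up rides: 8 by subscribers, 4 by customers.
df = pd.DataFrame({
    "start_hour": [8, 8, 8, 8, 12, 17, 17, 17, 13, 15, 16, 17],
    "user_type": ["Subscriber"] * 8 + ["Customer"] * 4,
})

# Rows: hour of day, columns: user type, cells: number of rides.
ct = pd.crosstab(df["start_hour"], df["user_type"])

totals = ct.sum()             # rides per user type
peak_hour = ct.idxmax()       # busiest hour per user type (first one on ties)
```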

On which days of the week are most trips taken, by user type?

As before, subscribers ride much more than casual customers overall. Subscribers take more trips on most days of the week but fewer on weekends (Saturday and Sunday), while customers take trips on every day of the week, including weekends.
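To compare weekly patterns between user types of very different sizes, the counts can be normalized within each user type with `normalize="columns"`, so that each column sums to 1. This is a sketch on invented rows.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up rides.
df = pd.DataFrame({
    "start_day": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
                  "Monday", "Saturday", "Sunday", "Wednesday"],
    "user_type": ["Subscriber"] * 6 + ["Customer"] * 4,
})

# Share of each user type's rides that fall on each day, in calendar order.
share = pd.crosstab(df["start_day"], df["user_type"], normalize="columns").reindex(days)
```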

Do male members ride more than female members within each user type?

Male subscribers outnumber female subscribers, and male customers likewise outnumber female customers.

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I made the following observations about how the feature of interest relates to other features:

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

I made the following observations about relationships between the other features:

Multivariate Exploration

How long does the average trip take for each day of the week depending on the type of user?

The chart above shows that subscribers take much shorter rides than customers on every day of the week. Both user types show a clear increase in trip duration on weekends (Saturday and Sunday), especially customers. Subscriber usage appears more efficient than customer usage overall, with a nearly constant average duration from Monday to Friday.
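The averages in this chart correspond to a pivot table of mean duration by day and user type. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Made-up rides.
df = pd.DataFrame({
    "start_day": ["Monday", "Monday", "Saturday", "Saturday", "Monday", "Saturday"],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Customer", "Subscriber", "Customer"],
    "duration_min": [9.0, 15.0, 12.0, 25.0, 11.0, 29.0],
})

# Rows: day, columns: user type, cells: mean duration in minutes.
pivot = df.pivot_table(index="start_day", columns="user_type",
                       values="duration_min", aggfunc="mean")
```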

At which hours of each day of the week are most trips taken, by user type?

The previous heatmaps clearly show very different usage patterns for the two user types:
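Each heatmap is essentially a day-by-hour count matrix computed separately for each user type. The sketch below builds those matrices with `pd.crosstab` and fills days and hours without rides with zeros; each matrix could then be drawn with a heatmap function such as `seaborn.heatmap`. The rows are invented.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up rides.
df = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"],
    "start_day": ["Monday", "Monday", "Tuesday", "Saturday", "Sunday"],
    "start_hour": [8, 8, 17, 14, 15],
})

# One 7 x 24 matrix of ride counts (day by hour) per user type.
matrices = {}
for user_type, group in df.groupby("user_type"):
    matrices[user_type] = (
        pd.crosstab(group["start_day"], group["start_hour"])
        .reindex(index=days, columns=range(24), fill_value=0)
    )
```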

At which hours of each day are most trips taken, by user type and member gender?

From the previous plot, I observed:

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The multivariate exploration reinforced some of the patterns found in the earlier univariate and bivariate explorations.

Were there any interesting or surprising interactions between features?

I did not observe any big surprises here: all the interactions between the features complement each other and make sense.